
45 research outputs found

Internal Pattern Matching Queries in a Text and Applications

Author: Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication date: 13/10/2014

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.

Comment: 31 pages, 9 figures; accepted to SODA 201
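To make the query concrete, here is a minimal Python sketch of what an internal pattern matching query returns. It is a naive scan, not the constant-time data structure from the abstract. The function names and the half-open index convention are assumptions made for illustration. The sketch relies on the standard periodicity fact that when $|y| \le 2|x|$ the occurrences of $x$ in $y$ form an arithmetic progression, which is one way a constant-space representation can arise.

```python
def occurrences(text, x_start, x_end, y_start, y_end):
    """Naive scan: start positions (in text) where the subword
    x = text[x_start:x_end] occurs inside y = text[y_start:y_end]."""
    x = text[x_start:x_end]
    m = len(x)
    return [i for i in range(y_start, y_end - m + 1) if text[i:i + m] == x]


def as_progression(positions):
    """Compress sorted positions into (first, step, count).

    Raises ValueError if the positions are not an arithmetic progression,
    which should not happen when |y| <= 2|x|."""
    if not positions:
        return None
    if len(positions) == 1:
        return (positions[0], 0, 1)
    step = positions[1] - positions[0]
    if any(b - a != step for a, b in zip(positions, positions[1:])):
        raise ValueError("positions do not form an arithmetic progression")
    return (positions[0], step, len(positions))


text = "abababab"
# x = "abab" (positions 0..3), y = the whole text, so |y| = 2|x|.
hits = occurrences(text, 0, 4, 0, 8)
print(hits, as_progression(hits))
```

The triple (first, step, count) has constant size regardless of how many occurrences there are, which is the point of the length assumption on $y$.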

arXiv.org e-Print Archive

Crossref

Faster Longest Common Extension Queries in Strings over General Alphabets

Author: Gawrychowski Paweł
Kociumaka Tomasz
Rytter Wojciech
Waleń Tomasz
Publication date: 01/01/2016

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of $q$ LCE queries for a string of size $n$ over a general ordered alphabet can be realized in $O(q \log \log n+n\log^*n)$ time making only $O(q+n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in $O(n \log \log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We make significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to $O(q\log n + n\log^*n)$. The main tools are difference covers and the disjoint-sets data structure.

Comment: Accepted to CPM 201
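For readers unfamiliar with the query itself, the following Python sketch shows what a single LCE query computes. It is the naive baseline that compares symbols one by one, not the algorithm from the abstract. The function name `lce` and the 0-based indexing are assumptions for illustration.

```python
def lce(s, i, j):
    """Length of the longest common prefix of the suffixes s[i:] and s[j:].

    Uses only equality tests between symbols, so it also works over an
    unordered alphabet. Worst-case time is linear in len(s) per query."""
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


print(lce("abcabcab", 0, 3))
print(lce("banana", 1, 3))
```

Answering $q$ such queries naively can cost on the order of $qn$ symbol comparisons in the worst case, which is the cost the bounds in the abstract improve on.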

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

Author: Fici Gabriele
Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication venue: 'Elsevier BV'
Publication date: 07/12/2015

We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implementation of our algorithm.

Comment: Published in Information Processing Letter

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Approximating reversal distance for strings with bounded number of duplicates

Author: Kolman Petr
Waleń Tomasz
Publication venue: Elsevier B.V.
Publication date: 01/02/2007

For a string $A = a_1 \ldots a_n$, a reversal $\rho(i,j)$, $1 \le i \le j \le n$, transforms the string $A$ into a string $A' = a_1 \ldots a_{i-1} a_j a_{j-1} \ldots a_i a_{j+1} \ldots a_n$, that is, the reversal $\rho(i,j)$ reverses the order of symbols in the substring $a_i \ldots a_j$ of $A$. In the case of signed strings, where each symbol is given a sign $+$ or $-$, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, $A$ and $B$, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string $A$ into the string $B$. Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, $k$-SBR, and allow each symbol to appear at most $k$ times in each string, for some $k \ge 1$. The main result of the paper is an $O(k^2)$-approximation algorithm running in time $O(n)$. For instances with $3 < k \le O(\log n \log^* n)$, this is the best known approximation algorithm for $k$-SBR and, moreover, it is faster than the previous best approximation algorithm
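The reversal operation defined above can be sketched in Python as follows. This only illustrates the operation itself, not the approximation algorithm. The function names and the encoding of a signed string as a list of (symbol, sign) pairs with sign +1 or -1 are assumptions made for illustration. Indices are 1-based and inclusive, as in the abstract.

```python
def reverse_unsigned(a, i, j):
    """Apply the reversal rho(i, j) to an unsigned string a,
    with 1-based inclusive indices satisfying 1 <= i <= j <= len(a)."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    return a[:i - 1] + a[i - 1:j][::-1] + a[j:]


def reverse_signed(a, i, j):
    """Apply rho(i, j) to a signed string given as a list of (symbol, sign)
    pairs: the segment is reversed and every sign in it is flipped."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    middle = [(sym, -sign) for sym, sign in reversed(a[i - 1:j])]
    return a[:i - 1] + middle + a[j:]


print(reverse_unsigned("abcde", 2, 4))
print(reverse_signed([("a", 1), ("b", 1), ("c", -1)], 1, 2))
```

Applying the same reversal twice restores the original string in both the signed and the unsigned case.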

Elsevier - Publisher Connector

A Note on Efficient Computation of All Abelian Periods in a String

Author: Crochemore Maxime
Iliopoulos Costas
Kociumaka Tomasz
Kubica Marcin
Pachocki Jakub
Radoszewski Jakub
Rytter Wojciech
Tyczyński Wojciech
Waleń Tomasz
Publication date: 16/08/2012

We derive a simple efficient algorithm for Abelian periods knowing all Abelian squares in a string. An efficient algorithm for the latter problem was given by Cummings and Smyth in 1997. Along the way, we show an alternative algorithm for Abelian squares. We also obtain a linear-time algorithm finding all `long' Abelian periods. The aim of the paper is a (new) reduction of the problem of all Abelian periods to that of (already solved) all Abelian squares, which provides new insight into both connected problems.
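As a reminder of the underlying notion, an Abelian square is a non-empty string whose second half is a permutation of its first half. The Python sketch below checks this and enumerates all Abelian squares in a string by brute force. It is not the algorithm of the paper or of Cummings and Smyth. The function names and the (start, length) output format are assumptions for illustration.

```python
from collections import Counter


def is_abelian_square(w):
    """True if w is non-empty, has even length, and both halves contain
    the same multiset of symbols."""
    if len(w) == 0 or len(w) % 2 != 0:
        return False
    half = len(w) // 2
    return Counter(w[:half]) == Counter(w[half:])


def abelian_squares(s):
    """Brute-force list of (start, length) pairs, 0-based, for every
    substring of s that is an Abelian square."""
    n = len(s)
    found = []
    for start in range(n):
        for length in range(2, n - start + 1, 2):
            if is_abelian_square(s[start:start + length]):
                found.append((start, length))
    return found


print(abelian_squares("abba"))
```

This brute-force enumeration takes roughly cubic time in the length of the string, so it is only suitable for checking small examples.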

arXiv.org e-Print Archive

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM